Searching for poor quality machine translated text : learning the difference between human writing and machine translations



Abstract

As machine translation (MT) tools have become mainstream, machine translated text has increasingly appeared on multilingual websites. Trustworthy multilingual websites are used as training corpora for statistical machine translation tools; large amounts of MT text in training data may make such products less effective. We performed three experiments to determine whether a support vector machine (SVM) could distinguish machine translated text from human written text (both original text and human translations). Machine translated versions of the Canadian Hansard were detected with an F-measure of 0.999. Machine translated versions of six Government of Canada web sites were detected with an F-measure of 0.98. We validated these results with a decision tree classifier. An experiment to find MT text on Government of Ontario web sites using Government of Canada training data was unfruitful, with a high rate of false positives. Machine translated text appears to be learnable and detectable when using a similar training corpus.
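The results above are reported as F-measures. As a rough illustration of how such a score is derived from classifier output, here is a minimal sketch in plain Python. It assumes the balanced F1 variant (beta = 1), which the abstract does not specify. The function name `f_measure`, the label strings `"mt"` and `"human"`, and the toy label lists are all hypothetical and are not taken from the paper.

```python
def f_measure(y_true, y_pred, positive="mt", beta=1.0):
    """Return the F-measure for the `positive` class.

    y_true and y_pred are parallel lists of labels. beta=1.0 gives the
    balanced F1 score (an assumption; the abstract does not state beta).
    """
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: define the score as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical toy labels, not data from the paper.
y_true = ["mt", "mt", "mt", "human", "human", "human"]
y_pred = ["mt", "mt", "human", "mt", "human", "human"]
score = f_measure(y_true, y_pred)
```

In this toy case one machine translated document is missed and one human written document is wrongly flagged, so precision and recall are both 2/3. A high false positive rate, as reported for the Government of Ontario experiment, would lower precision and therefore the F-measure.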

Bibliographic record

  • Authors: Carter, Dave; Inkpen, Diana
  • Year: 2012
  • Original format: PDF
  • Language: eng
